[00:00.000 --> 00:06.320]  Hi, I'm Philip Stark. Thank you for coming to this virtual talk. I'm going to talk about testing
[00:06.320 --> 00:11.600]  ballot marking devices. This is joint work with an undergraduate student, Ran Xie, who visited
[00:11.600 --> 00:18.600]  Berkeley for the last year and did some research with me. Why do we need to test ballot marking
[00:18.600 --> 00:23.380]  devices? Well, they can print votes that differ from what the voter saw on the screen or heard
[00:23.380 --> 00:29.280]  through the audio interface. The idea of voter verifiability is not really refined enough to
[00:29.280 --> 00:35.500]  capture the security properties that we need in a voting system. In particular, the ability to
[00:36.080 --> 00:41.520]  catch an error and spoil the ballot and request another opportunity to vote isn't enough to make
[00:41.520 --> 00:47.140]  ballot marking devices safe voting technology. For example, recent research by Bernhard et al.
[00:47.140 --> 00:53.320]  showed that only 7% of voters notice errors that ballot marking devices have introduced
[00:53.320 --> 01:00.900]  into the printout. In effect, the security properties of paper are undermined by using
[01:00.900 --> 01:07.200]  ballot marking devices to mark the paper. There's a problem with the BMD security model in general.
[01:07.200 --> 01:11.920]  It basically makes voters responsible not only for their own errors, but also for the overall
[01:11.920 --> 01:17.060]  security of the system. But it doesn't give voters the tools they need to do that job. In particular,
[01:17.060 --> 01:21.600]  there's no way for a voter to present any other party, including an election official, with
[01:21.600 --> 01:28.000]  evidence that a BMD misbehaved. So if a voter complains to a local election official, there's
[01:28.000 --> 01:32.520]  no way for the election official to know whether the complaint reflects an actual malfunction,
[01:32.640 --> 01:39.180]  a voter error, or someone crying wolf to undermine trust in the election. As a result, error or
[01:39.180 --> 01:44.560]  malfeasance could change a lot of votes without raising any kind of alarm. A number of proponents
[01:44.560 --> 01:50.760]  of BMDs claim that they have benefits such as preventing overvotes, warning about undervotes,
[01:50.760 --> 01:56.420]  and eliminating the possibility of ambiguous marks. I think that there are problems with those arguments.
[01:56.420 --> 02:00.780]  In particular, they assume that ballot marking devices function correctly. And there are many
[02:00.780 --> 02:05.320]  recent examples of failures on a wide scale, including in the state of Georgia, Northampton,
[02:05.320 --> 02:11.040]  Pennsylvania, and in Los Angeles. Precinct count optical scan systems can also protect against
[02:11.040 --> 02:17.820]  overvotes and undervotes. In fact, that's required under VVSG 1.0. So how can we figure out whether
[02:17.820 --> 02:22.540]  ballot marking devices actually worked adequately in a given election? We need to know that whatever
[02:22.540 --> 02:28.860]  errors occurred weren't numerous enough to change the outcome of any contest in the election.
[02:28.900 --> 02:32.580]  Three different approaches have been proposed to testing ballot marking devices.
[02:32.620 --> 02:37.940]  One is pre-election logic and accuracy testing, where you look at a machine before election day,
[02:37.940 --> 02:42.480]  run some test patterns through it, and verify that it prints the right thing. Another approach is
[02:42.480 --> 02:46.320]  passive testing, where you look at something like the spoiled ballot rate and try to detect
[02:46.320 --> 02:51.260]  anomalously high rates of spoiled ballots as a possible signal that the machines are misbehaving
[02:51.260 --> 02:55.840]  and voters are catching it. And the third approach is parallel or live testing, where
[02:55.840 --> 03:01.920]  testers periodically throughout the day, on election day, or during early voting period,
[03:01.920 --> 03:07.920]  will mark some ballots but not cast them, and verify that what's marked on the printout matches
[03:07.920 --> 03:12.620]  what their intent was. And the point of our research is to show that none of these, in fact,
[03:12.620 --> 03:18.540]  work in practice. Now, how much testing do we really need to do? That depends on how big a
[03:18.540 --> 03:24.300]  problem would make a material difference, and I have argued a long time that a sensible threshold
[03:24.300 --> 03:29.040]  for materiality is enough to change the reported winner of one or more contests. That is, we'd like
[03:29.040 --> 03:35.000]  to have high confidence that whatever errors occurred, they didn't alter who won. Many contests
[03:35.000 --> 03:41.160]  in the U.S. are decided by less than one percent, even statewide contests. For example, in the 2016
[03:41.160 --> 03:48.160]  presidential election, the margins in the statewide contests in Michigan, New Hampshire, Pennsylvania,
[03:48.160 --> 03:53.280]  and Wisconsin were all under one percent, with Michigan being as low as 0.22 percent.
[03:54.100 --> 03:59.380]  Now, I'm going to frame this as a two-person adversarial game and think about what strategies
[03:59.380 --> 04:04.740]  are available to the two players. So the evildoer is Mallory, who's trying to alter the outcome of
[04:04.740 --> 04:09.140]  one or more contests in an election. Mallory doesn't want to be detected. The point of this
[04:09.140 --> 04:13.740]  isn't to cast fear, uncertainty, and doubt. It's to get away with altering an election outcome.
[04:14.100 --> 04:20.060]  Mallory knows how the ballot marking devices will be tested in general because that will
[04:20.060 --> 04:24.240]  be a matter of public record. That is an action taken by the local election official.
[04:25.800 --> 04:33.800]  So Mallory knows the state history of the machine. Mallory knows how voters are
[04:33.800 --> 04:37.780]  interacting with it and knows what votes have been cast earlier in the day, how long each voting
[04:37.780 --> 04:43.540]  session took, and so on. And Mallory has a good model of voter behavior because Mallory can
[04:43.540 --> 04:47.580]  basically install spyware on voting machines and keep track of how voters interact with those
[04:47.580 --> 04:53.820]  machines in previous elections and on into the future. In contrast, Pat is our tester. Pat is
[04:53.820 --> 04:58.480]  trying to make sure that any ballot marking device problem that alters one or more outcomes will be
[04:58.480 --> 05:04.940]  detected. In contrast to Mallory, Pat has to obey the law, has to protect voter privacy. Pat doesn't
[05:04.940 --> 05:09.340]  know which contest Mallory will attack nor the strategy Mallory is going to use to attack them.
[05:09.340 --> 05:15.640]  So this is a very asymmetric problem. All right, so because the threshold for materiality depends
[05:15.640 --> 05:21.580]  on the number of votes that it takes to alter an election outcome, it's important to keep track of
[05:21.580 --> 05:30.300]  how big or how small elections are in the United States. The median turnout by county in the
[05:30.300 --> 05:39.340]  3,017 U.S. counties in 2018 was a little under 3,000 voters.
[05:39.740 --> 05:45.620]  There are fewer than 43,000 voters in more than two-thirds of U.S. jurisdictions.
[05:45.980 --> 05:53.560]  And in 73 percent of states, more than 50 percent of counties have fewer than 30,000 active voters.
[05:53.560 --> 06:04.360]  That is, the median size of turnout in a county is 30,000 voters or fewer. In 92 percent of states,
[06:04.360 --> 06:09.760]  that number is 100,000. That is, more than 50 percent of counties have fewer than 100,000
[06:09.760 --> 06:18.580]  active voters. In 2019, only 317 U.S. cities had populations of 100,000 or more
[06:18.580 --> 06:25.680]  out of more than 19,000 incorporated places. So if about 80 percent of the population is of voting age
[06:25.680 --> 06:31.560]  and turnout is about 55 percent, which is roughly what it's been historically, then contests for
[06:31.560 --> 06:37.620]  elected officials in something like 98 percent of incorporated places involve fewer than 44,000
[06:37.620 --> 06:43.980]  voters. So we need to think about ways of testing things on contests that involve fewer than 44,000
[06:43.980 --> 06:50.320]  voters, and many contests will involve even, you know, fewer than 3,000 voters. The 2019
[06:51.900 --> 06:58.460]  median population of U.S. incorporated areas is about 725, so about 50 percent of incorporated
[06:58.460 --> 07:06.140]  places have a turnout of less than 320 voters. All right, this is just to give an idea how much
[07:06.140 --> 07:13.220]  of the country had a median turnout in 2018 of less than 30,000 voters. It's
[07:13.220 --> 07:19.400]  most of the country by area. So what's Mallory's strategy space? How can Mallory figure out which
[07:19.400 --> 07:26.920]  transactions or what votes to try to alter? Mallory basically can pick based on a very large
[07:26.920 --> 07:32.880]  number of state variables in the ballot marking device, the time of day, how long the wait was
[07:32.880 --> 07:38.580]  between voters, how many people have voted on the machine already, how does this particular voter
[07:38.580 --> 07:42.820]  interact with the machine, including the selections, what contests the voter ignores,
[07:42.820 --> 07:48.120]  how many times the voter revises selections, how long the voter reviews things, whether the voter
[07:48.120 --> 07:55.180]  looks at every page of candidates in a contest, how long the voter reviews selections, inactivity
[07:55.180 --> 08:01.240]  warnings, BMD settings, font sizes, languages, whether the voter uses the audio interface,
[08:01.240 --> 08:06.000]  the sip-and-puff interface. All of these things are available to an evildoer, to Mallory, to try
[08:06.000 --> 08:12.300]  to hack the election. Now here are some examples of just how many different possible voting
[08:12.300 --> 08:18.120]  transactions there are. I'm giving two columns of numbers. The more realistic number is pretty
[08:18.120 --> 08:26.040]  realistic in the United States. Many jurisdictions have ballots that contain 20 or more contests,
[08:26.040 --> 08:30.780]  but we're going to use three as basically a lower bound. And similarly, you can look at different
[08:30.780 --> 08:34.980]  variables that Mallory could use to target these things, from the number of candidates per contest,
[08:34.980 --> 08:42.680]  time of day, number of people that voted, time for selection, the settings that the voter uses,
[08:42.680 --> 08:48.560]  the contrast and saturation of the screen, font size, audio use, tempo, volume, and so on. So
[08:48.560 --> 08:52.920]  conservatively, there's on the order of six million different combinations of settings
[08:53.540 --> 08:59.820]  that are likely to have some reasonable probability of being used. More realistically,
[08:59.820 --> 09:05.560]  there's something over 10 to the 47th, a truly staggering number of possible voting transactions.
[09:05.560 --> 09:11.040]  There's no way to probe even a microscopic fraction of those using testing, either
[09:11.040 --> 09:18.500]  pre-election logic and accuracy testing or live testing. So that's what Mallory can do. What can Pat do?
[09:18.500 --> 09:24.620]  Pat can monitor voter behavior in a non-invasive, non-privacy-invading way, in particular can look
[09:24.620 --> 09:30.460]  at spoiled ballot rates, and Pat can try to catch a malfunction by using the BMD before, during,
[09:30.460 --> 09:35.960]  or after an election. That is, doing logic and accuracy testing, or live testing, or post-mortem testing.
[09:35.960 --> 09:41.760]  So Pat really does have to test at random in some way. If Pat tests in a way that's
[09:41.760 --> 09:47.640]  predictable, such as once an hour, or pulls only one machine aside and tests it, or tests only some
[09:47.640 --> 09:53.060]  combinations of votes, only interacts with the machine in some particular way, then because
[09:53.060 --> 09:59.960]  Mallory knows what Pat's strategy is, Mallory can just avoid changing those transactions and hide.
[10:01.220 --> 10:05.800]  Similarly, Pat can't just set aside machines on election day for live testing. Pat needs to test
[10:05.800 --> 10:10.300]  the machines that are actually in use, or malware could detect that the machine is being used in a
[10:10.300 --> 10:16.900]  way that is not typical of voters. And moreover, because there are so many possible combinations
[10:16.900 --> 10:23.180]  of settings, combinations of ways of transacting a vote, uniform random sampling is doomed. You
[10:23.180 --> 10:28.840]  really do need to sample more often from those transactions that voters are going to use more
[10:28.840 --> 10:33.840]  often in order to have a reasonable chance of sampling at least once from any set of transactions
[10:33.840 --> 10:38.020]  that contain enough votes to alter the outcome of one or more contests in the election.
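The point about uniform sampling can be illustrated with a rough sketch. It compares the chance that at least one test lands in a targeted set of transaction types under two testing schemes: tests drawn the way voters actually behave, and tests drawn uniformly over all transaction types. The six million total comes from the conservative count mentioned earlier in the talk; the targeted set (10 transaction types carrying 1% of real voter transactions) and the 1,000-test budget are hypothetical values chosen only for illustration.

```python
def hit_probability(per_test_chance, tests):
    """Chance that at least one of `tests` independent tests lands in the targeted set."""
    return 1.0 - (1.0 - per_test_chance) ** tests


# Conservative count of transaction types, taken from the talk.
TOTAL_TRANSACTION_TYPES = 6_000_000

# Hypothetical attack: Mallory alters 10 popular transaction types that
# together account for 1% of what voters actually do.
TARGETED_TYPES = 10
TARGETED_SHARE_OF_VOTERS = 0.01

# Hypothetical testing budget.
TESTS = 1_000

# Tests that mimic voters hit the targeted set with probability equal to its share of voters.
mimic = hit_probability(TARGETED_SHARE_OF_VOTERS, TESTS)

# Uniform tests hit the targeted set with probability equal to its share of transaction types.
uniform = hit_probability(TARGETED_TYPES / TOTAL_TRANSACTION_TYPES, TESTS)

print(f"tests that mimic voters: {mimic:.4f}")
print(f"uniform random tests:    {uniform:.4f}")
```

Under these assumed numbers, tests that mimic voters almost certainly touch the targeted set, while uniform tests almost never do, which is the sense in which uniform random sampling is doomed.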
[10:38.540 --> 10:43.040]  So ideal sampling would mimic voter behavior. It would basically sample from what voters actually
[10:43.040 --> 10:49.960]  do. So we're going to look at exactly that. Suppose we could mimic voters perfectly. How many
[10:50.520 --> 10:54.420]  transactions would we actually need to use as tests in order to have a good chance of detecting
[10:54.420 --> 11:01.440]  outcome-changing errors or alterations? So it's important to know that in a jurisdiction-wide
[11:01.440 --> 11:07.980]  contest, changing the votes on 1% of transactions can typically change the margin by 2%, but if
[11:07.980 --> 11:12.860]  there's undervotes, you can change it by even more than that. And if the contest is only on a fraction
[11:12.860 --> 11:16.960]  of ballots cast in the election, then you don't need to change even that large of a percentage to
[11:16.960 --> 11:22.400]  change a margin by a much larger percentage. For instance, if you have a contest that's only on,
[11:22.400 --> 11:28.040]  that only 1 in 10 voters is eligible to vote in, and the undervote rate is 30%, then changing the
[11:28.040 --> 11:33.940]  votes on 1% of transactions could change the margin in that contest by 29%. That's a lot of leverage.
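The leverage arithmetic above can be checked with a short sketch. It assumes, as a simplification not spelled out in the talk, that every altered transaction flips one vote from the true winner to the true loser (moving the margin by two votes) and that the margin is measured as a fraction of the valid votes in the contest. The function and parameter names are illustrative.

```python
def margin_shift(altered_fraction, contest_fraction=1.0, undervote_rate=0.0):
    """Margin change, as a fraction of valid votes in the contest.

    altered_fraction: share of all transactions whose vote in this contest is flipped
    contest_fraction: share of ballots on which the contest appears
    undervote_rate:   share of eligible ballots with no vote in the contest
    """
    valid_fraction = contest_fraction * (1.0 - undervote_rate)
    if altered_fraction > valid_fraction:
        raise ValueError("cannot alter more votes than were cast in the contest")
    # Each flipped vote removes one vote from one candidate and adds one to the other.
    return 2.0 * altered_fraction / valid_fraction


# Jurisdiction-wide contest, no undervotes: 1% of transactions moves the margin by 2%.
print(f"{margin_shift(0.01):.1%}")

# Contest on 1 in 10 ballots with a 30% undervote rate.
print(f"{margin_shift(0.01, contest_fraction=0.10, undervote_rate=0.30):.1%}")
```

With these inputs the second case comes out a little under 29%, matching the figure quoted in the talk after rounding.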
[11:34.940 --> 11:41.940]  So passive testing relies on voters noticing errors and spoiling their ballots. Now in order to
[11:41.940 --> 11:47.620]  know how large a spoilage rate is enough to sound an alarm, we have to have a good idea of how often
[11:47.620 --> 11:52.480]  voters spoil ballots when the machines are functioning correctly, and then we have to know
[11:52.480 --> 11:58.180]  how often they will notice errors if errors happen, and whether they will report
[11:58.180 --> 12:03.600]  those errors and thereby, you know, request a new ballot and trigger an alarm.
[12:03.660 --> 12:08.780]  The problem is that that kind of training data is unlikely to be available, in part because you can't step
[12:08.780 --> 12:13.280]  on the same election twice. There are all kinds of differences from election to election that are
[12:13.280 --> 12:17.260]  likely to change the spoiled ballot rate, including complexity of the ballot, ballot layout,
[12:18.240 --> 12:23.500]  complexity of the social choice functions, and so on. So how do we set a threshold for when to
[12:23.500 --> 12:29.140]  sound an alarm if we're using passive testing? It's going to depend in part
[12:29.140 --> 12:33.920]  on the number of transactions Mallory alters, which votes are affected, which contests are affected,
[12:33.920 --> 12:37.640]  and so on. And Pat is not going to know any of these things. Pat needs to test in a way that
[12:37.640 --> 12:42.920]  is going to be sensitive enough to changing the outcome of any contest whatsoever.
[12:43.600 --> 12:48.020]  So let's make some really optimistic assumptions and work through the numbers and figure out just
[12:48.020 --> 12:52.140]  how much testing Pat would have to do or how many voters would have to be voting in the particular
[12:52.140 --> 12:57.220]  contest so that a change in the spoiled ballot rate would be noticed. Let's assume in
[12:57.220 --> 13:01.880]  particular that spoiled ballots follow a Poisson distribution
[13:01.880 --> 13:06.020]  with a known rate if there's no hacking and a different known rate if there is hacking. Now
[13:06.020 --> 13:10.480]  there's no reason to assume that except it's a common model for things. This is really just a
[13:10.480 --> 13:17.520]  thought experiment. It's not intended to be a realistic model of how voters detect and
[13:17.520 --> 13:22.840]  spoil ballots. We're going to look at contest margins of one to five percent and false positive
[13:22.840 --> 13:27.320]  and false negative rates of five percent and one percent. A false positive is saying that
[13:27.440 --> 13:31.560]  there's a problem when there really isn't one. A false negative is failing to notice that
[13:31.560 --> 13:36.240]  there's a problem when in fact one or more outcomes have been altered. Now here's kind of
[13:36.240 --> 13:41.900]  how it plays out. This is for five percent rate of false negatives and false positives. If you look
[13:41.900 --> 13:49.520]  at this, going across the top is the base rate of spoiled ballots when things are clean and then
[13:49.520 --> 13:58.580]  as you go from row to row you're really looking at what is the rate of errors in the printouts
[13:58.580 --> 14:03.880]  that would be required to reverse a margin of size one percent, two percent, three percent,
[14:03.880 --> 14:08.420]  on to five percent. The detection rate we're assuming is either seven percent or twenty-five
[14:08.420 --> 14:13.980]  percent. Seven percent is consistent with what Bernhard et al. found in their study of actual
[14:13.980 --> 14:20.580]  voters in an experiment that wasn't an actual election. So if you look at this to have a five
[14:20.580 --> 14:26.540]  percent false positive and false negative rate, you'd need on the order of half a million ballots
[14:26.540 --> 14:33.380]  or more for a realistic rate of voters detecting errors and spoiling their ballots to protect
[14:33.380 --> 14:38.540]  against altering a contest with a margin of one percent. That number goes down as the margin gets
[14:38.540 --> 14:43.420]  wider, but as I've already argued there are a very large number of contests that are decided by one
[14:43.420 --> 14:49.500]  percent or less, important contests. If we set a more stringent threshold, allowing only a one
[14:49.500 --> 14:53.520]  percent rate of false negatives or false positives, then we would need on the order of a million
[14:53.520 --> 14:58.680]  voters or more in the contest in order to be able to detect an alteration
[14:59.340 --> 15:06.400]  to enough ballots to reverse a margin of one percent. So let's think about how big this
[15:06.400 --> 15:11.220]  number, half a million or a million, is in the context of actual elections. I'm
[15:11.220 --> 15:16.880]  going to use California as an example. 41 of California's 58 counties had fewer than 100,000
[15:16.880 --> 15:22.300]  voters in the 2018 midterm election, so passive auditing would not have worked for any of those.
[15:22.300 --> 15:28.020]  33 had fewer than 100,000 voters in the 2016 presidential election, so again passive auditing
[15:28.020 --> 15:32.700]  would not have given you an acceptably low false positive and false negative rate, even under these
[15:32.700 --> 15:37.600]  optimistic assumptions that everything follows a Poisson distribution with a known rate.
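The order of magnitude of these ballot counts can be reproduced with a rough sketch of the Poisson thought experiment. This is not the calculation behind the table in the talk: it uses a normal approximation to the Poisson distribution, it assumes Mallory alters half the margin's worth of ballots, and the 0.5% base spoilage rate is a hypothetical value, since the base rates shown on the slide are not in the transcript. The function name and parameters are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def ballots_needed(base_rate, margin, detection_rate, alpha=0.05, beta=0.05):
    """Approximate number of ballots needed to tell the two spoilage rates apart.

    base_rate:      spoiled-ballot rate when the BMDs work correctly (assumed known)
    margin:         contest margin Mallory wants to reverse, as a fraction of ballots
    detection_rate: share of voters who notice an altered printout and spoil the ballot
    alpha, beta:    allowed false positive and false negative rates
    """
    # Flipping a vote moves the margin by two, so half the margin must be altered.
    altered_fraction = margin / 2.0
    hacked_rate = base_rate + detection_rate * altered_fraction

    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)

    # Normal approximation: the alarm threshold must sit z_alpha standard deviations
    # above the clean mean and z_beta standard deviations below the hacked mean.
    root_n = (z_alpha * sqrt(base_rate) + z_beta * sqrt(hacked_rate)) / (hacked_rate - base_rate)
    return ceil(root_n ** 2)


# Hypothetical base spoilage rate of 0.5%; 7% detection rate as quoted in the talk.
for margin in (0.01, 0.02, 0.03, 0.04, 0.05):
    n = ballots_needed(base_rate=0.005, margin=margin, detection_rate=0.07)
    print(f"margin {margin:.0%}: about {n:,} ballots")
```

Under these assumptions a 1% margin requires several hundred thousand ballots, and tightening both error rates to 1% roughly doubles that, consistent with the half-million and one-million figures quoted above.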
[15:38.340 --> 15:43.040]  So passive testing couldn't have protected contests with margins of three percent or smaller
[15:43.040 --> 15:49.020]  in those jurisdictions that have a hundred thousand or fewer voters. In many California
[15:49.020 --> 15:53.420]  counties, turnout is so small that there would be no way to detect problems through spoilage rates
[15:53.420 --> 15:58.920]  without having an unacceptably high rate of false alarms. You would be invalidating elections left
[15:58.920 --> 16:09.080]  and right. Okay, so that analysis assumed that votes were being changed more or less at random,
[16:09.080 --> 16:15.140]  that every voter had some chance of having his or her votes altered, but in fact Mallory has
[16:15.140 --> 16:20.500]  access to information to use to target the attack against voters who are less likely to notice
[16:20.500 --> 16:30.000]  problems or less likely to spoil a ballot if there is a problem. In particular, Mallory could target
[16:30.000 --> 16:38.380]  voters with visual impairments, voters who are blind. Current ballot
[16:38.380 --> 16:43.320]  marking devices don't provide such voters a technology to check whether the printout actually
[16:43.320 --> 16:50.760]  matches the voter's intentions or what the voter was told on the audio output or shown on the screen in,
[16:50.760 --> 16:55.560]  you know, a larger font size. So if two percent of voters have a visual impairment that would
[16:55.560 --> 17:00.240]  prevent them from checking the printout directly themselves, then Mallory could change the outcomes
[17:00.240 --> 17:04.220]  of jurisdiction-wide contests that have margins of up to four percent without increasing the
[17:04.220 --> 17:08.760]  spoiled ballot rate at all, because those voters would have no opportunity to notice that there
[17:08.760 --> 17:16.460]  was a problem in the printout. For voters who have some kind of motor impairment,
[17:16.460 --> 17:21.840]  limited dexterity that makes it difficult for them to handle a piece
[17:21.840 --> 17:27.480]  of paper, some ballot marking devices have an accessibility feature that allows a voter to
[17:28.280 --> 17:33.900]  cast the ballot without actually handling it. In some cases those features don't even print the
[17:33.900 --> 17:41.220]  ballot until the voter has said I want you to cast this for me after you print it. Now because that doesn't
[17:41.220 --> 17:46.660]  even give the voter the opportunity to look at the piece of paper, the ballot marking device can cheat
[17:47.280 --> 17:52.360]  with impunity on ballots like that. So if there are enough voters who are using this autocast
[17:52.360 --> 17:59.700]  feature, then relatively wide margins can be altered without any possibility of detection.
[18:00.780 --> 18:07.200]  Then there are voters who use languages other than English. If a voter is looking at a ballot on screen in one language
[18:07.200 --> 18:13.400]  but then printing it out in English, the voter might be less likely to check the printout if
[18:13.400 --> 18:21.020]  the voter is not so comfortable in English. Moreover, if a voter who is, you know,
[18:21.020 --> 18:27.420]  clearly not a native English speaker reports a problem with a ballot, it might be the case that
[18:27.420 --> 18:34.620]  some poll workers would be less likely to believe that the voter actually detected a
[18:34.620 --> 18:40.760]  problem with the device and rather more likely to believe that the voter made a mistake.
[18:40.760 --> 18:45.220]  There's all kinds of things that Mallory can monitor to try to target the attack by looking
[18:45.220 --> 18:48.620]  at how much attention the voter is paying, whether the voter is in a hurry, whether the
[18:48.620 --> 18:56.720]  voter is reviewing selections and so on. All right, so there are other problems with passive
[18:56.720 --> 19:02.320]  testing. Among them, it becomes a really easy way to mount a FUD attack, fear, uncertainty,
[19:02.320 --> 19:07.080]  and doubt, by simply asking voters to spoil their ballots more often, thereby casting doubt on the
[19:07.080 --> 19:13.440]  election. All right, so let's look at these oracle bounds now. Suppose that instead of relying on
[19:13.440 --> 19:19.540]  voters to run the tests, we're going to try something like parallel testing, active
[19:19.540 --> 19:25.480]  testing, or logic and accuracy testing, where we input test patterns and look at what comes out.
[19:25.480 --> 19:33.800]  So if we had perfect knowledge of voter behavior, that is if basically Pat could pick voters at
[19:33.800 --> 19:38.380]  random and look over their shoulders while they vote and see whether the printout matched what
[19:38.380 --> 19:44.500]  the vote was presented on the screen or in the audio output, that's kind of a best case scenario
[19:44.500 --> 19:50.460]  that doesn't involve Pat having to figure out the distribution of voter transactions with
[19:50.460 --> 19:54.560]  the BMD. So even under those circumstances,
[19:57.500 --> 20:05.480]  you have to spy on a fair number of voters,
[20:05.480 --> 20:08.560]  have to look over the shoulder of a fair number of voters, to have a reasonable chance of
[20:09.260 --> 20:16.040]  noticing a problem in the election. So for instance, if Mallory alters 15 transactions
[20:16.040 --> 20:21.000]  in a contest that has a little under 3,000 voters, which was the 2018 median jurisdiction
[20:21.820 --> 20:28.860]  turnout, that could change the margin of the contest by one percent or more, but to have a 95 percent chance of catching it, Pat would
[20:28.860 --> 20:34.380]  need to look over the shoulder of at least 540 voters, about 18 percent of the capacity of the
[20:34.380 --> 20:39.260]  machine. That would involve testing each ballot marking device several times an hour, but once
[20:39.260 --> 20:44.840]  an hour would not be enough. If you're limiting things to once an hour for 13 hours a day,
[20:44.840 --> 20:52.440]  to have a 95 percent chance of catching a problem, you have to have over 6,500 voters in
[20:52.440 --> 20:56.660]  the contest, which is almost triple the median turnout in jurisdictions in the U.S. and 20 times
[20:56.660 --> 21:02.320]  the median number of active voters in incorporated areas. In reality, Pat can't shoulder surf.
[21:02.320 --> 21:06.580]  Pat needs to make a model of voter behavior, and that's got to be calibrated to data.
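The shoulder-surfing bound quoted above can be checked with a short sketch. It assumes Pat observes a simple random sample of voters' transactions without replacement, so the chance of missing every altered transaction is hypergeometric, and it uses a 95 percent chance of catching at least one alteration. The turnout of 2,980 is a stand-in for "a little under 3,000", and the function names are illustrative.

```python
def miss_probability(voters, altered, observed):
    """Chance that a random sample of `observed` transactions, drawn without
    replacement from `voters` transactions, contains none of the `altered` ones."""
    if observed > voters - altered:
        return 0.0  # the sample is too large to avoid every altered transaction
    p = 1.0
    for i in range(altered):
        # Probability that the i-th altered transaction also falls outside the sample.
        p *= (voters - observed - i) / (voters - i)
    return p


def observations_needed(voters, altered, confidence=0.95):
    """Smallest sample size that catches at least one alteration with the given confidence."""
    for observed in range(voters + 1):
        if miss_probability(voters, altered, observed) <= 1.0 - confidence:
            return observed
    return voters


# Assumed turnout of 2,980 voters with 15 altered transactions, as in the example above.
needed = observations_needed(voters=2980, altered=15)
print(needed, f"{needed / 2980:.0%}")
```

Under these assumptions the answer lands in the neighborhood of the 540 voters, about 18 percent, quoted in the talk.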
[21:06.680 --> 21:12.220]  That's going to require monitoring voters in extreme detail, all of the details I mentioned
[21:12.220 --> 21:17.440]  before relating to voting transactions. That would compromise voter privacy completely
[21:17.870 --> 21:24.240]  and probably be illegal. Nonetheless, let's imagine that Pat had a budget of running an
[21:24.240 --> 21:32.700]  infinite number of tests using a particular model. How many voters would Pat need to observe in order
[21:32.700 --> 21:37.920]  to get a model that was accurate enough to detect an alteration to some fraction of the votes?
[21:37.920 --> 21:43.180]  To have 99 percent confidence of detecting a change to a half percent of the votes,
[21:43.180 --> 21:48.100]  even if Pat could conduct an infinite number of tests in order to model behavior well enough to
[21:48.100 --> 21:54.420]  detect things at that confidence level, would involve monitoring three and three-quarter
[21:54.420 --> 21:59.680]  million voters in excruciating detail. Of course, their behavior in one election might not match
[21:59.680 --> 22:04.680]  their behavior in another election, and many jurisdictions that you need to be monitoring
[22:04.680 --> 22:10.200]  aren't big enough. They just simply don't have that many voters. As you look at larger margins
[22:10.200 --> 22:15.420]  and lower confidence levels, that bound goes down. But even for 95 percent confidence of
[22:15.420 --> 22:21.100]  detecting alterations to five percent of the votes, you'd have to observe more than a million
[22:21.100 --> 22:27.420]  voters. These are very, very conservative bounds. If you were limited to 2,000 tests,
[22:27.420 --> 22:31.840]  then you would need to have an even more accurate
[22:31.840 --> 22:36.460]  model, so you would have to observe even more voters. So the message here is you really have to
[22:36.460 --> 22:41.660]  monitor at least a million voters in excruciating detail in order to have a reasonable chance of
[22:41.660 --> 22:47.200]  picking up outcome-changing errors. All right, so the situation is really not very good. I mean,
[22:47.200 --> 22:51.760]  it's actually worse than this. Even if you were able to do that much testing, if you find a
[22:51.760 --> 22:56.160]  problem, the only remedy is a new election. You have no idea which transactions were altered,
[22:56.160 --> 23:01.520]  what the right outcome should be. Margins aren't known before the testing happens. If it turns out
[23:01.520 --> 23:06.980]  that the margin is smaller than you allocated testing for while the election was going on,
[23:06.980 --> 23:13.100]  there's no way to go back and fix that. The tests themselves have uncertainty, and that means
[23:13.100 --> 23:17.420]  that you really need to factor that in in deciding who really won. Is there really strong
[23:17.420 --> 23:22.640]  enough evidence that someone really won if there's only a 95% chance you would have seen a problem
[23:22.640 --> 23:29.960]  that big? Moreover, this is going to require new systems, extra hardware, additional staff, and
[23:29.960 --> 23:34.260]  additional training. This is a very expensive proposition, even if it could possibly be
[23:34.260 --> 23:41.980]  mounted. Also, as I mentioned before, while BMDs are widely touted
[23:41.980 --> 23:48.060]  as helping some groups of voters, in fact BMDs pose an ideal vector for disenfranchising those
[23:48.060 --> 23:53.020]  very same groups of voters because they will interact with the machines in specific ways.
[23:53.700 --> 23:59.040]  In short, there doesn't really seem to be a way to rescue the trustworthiness of elections if
[23:59.040 --> 24:02.500]  you're casting most of the votes on ballot marking devices, or a substantial fraction
[24:02.500 --> 24:07.120]  of votes on ballot marking devices. And prudent election administration would involve minimizing
[24:07.120 --> 24:11.860]  the number of voters who use ballot marking devices, really reserving them for their
[24:11.860 --> 24:16.400]  accessibility benefits, where those accessibility benefits really do help particular groups of
[24:16.400 --> 24:18.800]  voters. Thank you very much for your attention.
